Computational model predicts protein binding sites of a luminescent ligand equipped with guanidiniocarbonyl-pyrrole groups

The 14-3-3 protein family, one of the first discovered phosphoserine/phosphothreonine binding proteins, has attracted interest not only because of its important role in cell regulatory processes but also because of its enormous number of interactions with other proteins. Here, we use a computational approach to predict the binding sites of a designed hybrid compound featuring aggregation-induced emission luminophores as a potential supramolecular ligand for 14-3-3ζ in the presence and absence of C-Raf peptides. Our results suggest that the area above and below the central pore of the dimeric 14-3-3ζ protein is the most probable binding site for the ligand. Moreover, we predict that the position of the ligand is sensitive to the presence of phosphorylated C-Raf peptides. With a series of experiments, we confirmed the computational prediction of two C2-related, dominant binding sites on 14-3-3ζ that may bind two of the supramolecular ligand molecules.

The distance between GCP and HYD was obtained by tracking the positions of C1 atom 1 in HYD and the central guanidinium carbon in GCP. We used these time series to calculate a probability distribution, whose negative logarithm gave us the energy. Given a sufficiently long MD simulation of QQJ-096 in water, the three energy-distance plots (Figure S2) should be identical due to the threefold symmetry of the ligand. We attribute the differences we observe to undersampling.
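The conversion of a distance time series into an energy profile can be sketched as follows. This is a minimal illustration, not the analysis script used in this work: the function name `energy_profile`, the bin count, the synthetic harmonic data and the quadratic fit near the minimum are all assumptions made for the example, and energies are expressed in units of kT.

```python
import numpy as np

def energy_profile(distances, bins=50, kT=1.0):
    """Boltzmann-invert a distance time series: E(r) = -kT * ln P(r)."""
    # The normalised histogram approximates the probability density P(r).
    density, edges = np.histogram(distances, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    populated = density > 0            # empty bins would give log(0)
    energy = -kT * np.log(density[populated])
    energy -= energy.min()             # shift so that the minimum is zero
    return centers[populated], energy

# Synthetic stand-in for an MD distance series (hypothetical values, kT units):
# samples from a harmonic well with equilibrium distance r0 and spring constant k.
rng = np.random.default_rng(0)
r0, k_spring = 10.0, 4.0
samples = rng.normal(r0, np.sqrt(1.0 / k_spring), size=200_000)
r, e = energy_profile(samples, bins=60)

# A quadratic fit near the minimum, E ~ 0.5 * k * (r - r0)^2, recovers k and r0.
near = e < 3.0
coeffs = np.polyfit(r[near], e[near], 2)
k_est = 2.0 * coeffs[0]
r0_est = -coeffs[1] / (2.0 * coeffs[0])
```

With poorly sampled data, the sparsely populated bins in the tails dominate the noise in the profile, which is one way undersampling can make symmetry-equivalent profiles differ.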
To improve the estimation of the two spring constants, we calculated the average of the three fitted polynomials, which is itself a polynomial. For the spring connecting lysine and GCP we

We performed 4000 SAMC simulations on QQJ-096/14-3-3/c-Raf using the same protocol described in the "Experimental" section of the main manuscript. According to our results, both end groups show high affinity for the central cleft of 14-3-3/c-Raf (Figure S3, second and third rows), which is in line with previously reported results [1]. In the previous study, the central pore of the protein was reported as a highly probable binding site, with a spatial distribution of the most favorable sites in agreement with the most frequent salt bridge partners observed in all-atom MD simulations (14-3-3 Glu14, 14-3-3 Glu17, C-terminal c-Raf fragment His11).
In addition, our SAMC simulations make it possible to investigate the most probable binding region of HYD (Figure S3, first row), which had not been done before.

Electrostatic Potential Surface of 14-3-3
The electrostatic potential surface of the protein in the presence and absence of C-Raf peptides is illustrated in Figure S4.

Clustering
Applying a clustering method to the final positions of 1 provides a sanity check for our simulations. Because 14-3-3 is a homodimer with C2 symmetry, we expect that each binding site (or cluster of binding sites) that does not lie on the symmetry axis has a symmetry-related counterpart (site or cluster).
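The symmetry expectation can be checked numerically once cluster centers are available. The sketch below is an assumption-laden illustration, not part of the published workflow: the function names, the distance tolerance `tol` and the example coordinates are hypothetical, and the C2 axis is taken to be known (here the z axis through the origin).

```python
import numpy as np

def c2_rotate(points, axis_point, axis_dir):
    """Rotate points by 180 degrees about the line through axis_point along axis_dir."""
    u = np.asarray(axis_dir, dtype=float)
    u = u / np.linalg.norm(u)
    origin = np.asarray(axis_point, dtype=float)
    rel = np.asarray(points, dtype=float) - origin
    # A 180-degree rotation keeps the component along the axis and flips the rest:
    # R v = 2 (v . u) u - v
    return 2.0 * np.outer(rel @ u, u) - rel + origin

def symmetry_partners(centers, axis_point, axis_dir, tol=2.0):
    """For each center, index of the center nearest to its C2 image, or -1 if none within tol."""
    centers = np.asarray(centers, dtype=float)
    images = c2_rotate(centers, axis_point, axis_dir)
    dists = np.linalg.norm(images[:, None, :] - centers[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return np.where(dists.min(axis=1) <= tol, nearest, -1)

# Hypothetical cluster centers (arbitrary length units).
centers = [[5.0, 0.0, 3.0], [-5.0, 0.0, 3.0], [0.0, 0.0, 8.0], [7.0, 7.0, 0.0]]
partners = symmetry_partners(centers, axis_point=[0, 0, 0], axis_dir=[0, 0, 1])
```

In this toy example the first two centers are each other's partners, the third lies on the axis and maps onto itself, and the fourth has no partner, which in real data would point to incomplete sampling or an asymmetric structure.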
We use a robust silhouette-validated PAM (partitioning around medoids) clustering for this purpose [2]. The optimal number of clusters is extracted and visualized using the fviz_nbclust tool from the R package factoextra version 1.0.7, and clustering is done using the pam tool from the R package cluster version 2.1.0, in R version 3.5.3 [3].
The optimal number of clusters is obtained for the two series of simulations (in the presence and absence of C-Raf peptides) using the average silhouette method. The clustered data points obtained from each series of simulations are then mapped in 2D using their principal components, as illustrated in Figures S6 and S7.
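The idea behind choosing the number of clusters by the average silhouette width can be sketched in a few lines. The analysis in this work was done with the R tools named above; the Python code below is only an illustrative stand-in. It uses a simplified alternating k-medoids scheme with a farthest-point initialisation rather than the full BUILD/SWAP procedure of PAM, and all function names and the synthetic data are assumptions.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def init_medoids(D, k):
    """Most central point first, then repeatedly the point farthest from all chosen medoids."""
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < k:
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    return np.array(medoids)

def k_medoids(D, k, n_iter=100):
    """Alternating k-medoids on a distance matrix D; returns (medoid indices, labels)."""
    medoids = init_medoids(D, k)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, D[:, medoids].argmin(axis=1)

def mean_silhouette(D, labels):
    """Average of s = (b - a) / max(a, b) over all points."""
    clusters = np.unique(labels)
    scores = np.zeros(len(D))
    for i in range(len(D)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue                   # singleton clusters score 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

def best_k(X, k_values=range(2, 7)):
    D = pairwise_distances(X)
    scores = {k: mean_silhouette(D, k_medoids(D, k)[1]) for k in k_values}
    return max(scores, key=scores.get), scores

# Synthetic final positions: two well-separated groups, as expected for C2-related sites.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0, 0], 1.0, (40, 3)), rng.normal([20, 0, 0], 1.0, (40, 3))])
k_opt, scores = best_k(X)
```

For two compact, well-separated groups the average silhouette width is highest at k = 2, because any larger k has to split one of the groups and lowers the score of its members.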

General information and instrumentation
The solvents used for synthesis and purification were dried and distilled before use. Water was purified with a TKA MicroPure ultrapure water system. The chemicals used for the synthesis were purchased from commercial sources and used without any

Synthetic routes
Compound A [4] and the active ester C [5] were synthesized as previously described in the literature. The activated azide F (tosylethylazide) can be prepared by analogy with a known two-step synthetic route [6]. The spectra obtained were in accordance with those reported in the literature.

Native gel electrophoresis
For native gel electrophoresis, a discontinuous polyacrylamide gel was prepared (1.5 mm, 10 wells) using 12.5% (w/v) acrylamide for the separation gel and 4% (w/v) acrylamide for the stacking gel (Table S1). The experimental setup (Table S2) included a 1 mM stock solution of 14-3-3 in native running buffer (Table S1) and 1 mM of ligand 1 dissolved in water. To avoid alterations in the protein structure caused by dilution with the ligand, we used 4× concentrated sample buffer to equalize the buffer conditions and added the protein last (Table S1). The samples were centrifuged at 10,000 × g for 1 min (ThermoFisher) to spin down potential aggregates before they were loaded onto the gel. Negative controls contained either protein (Tables S1 and S2).
Table S1: Composition of the gel, buffers and solutions.

Component    Composition
Native stacking gel buffer (4×)    0.5 M Tris-HCl, pH 6.8
Native separation gel buffer (4×)

In addition, the ligand in the well co-localized with the protein retained in the well. Of note, not all of the protein present in the well was detectable.

Fluorescence / UV-vis titration and Job plot analysis
In all spectroscopic measurements, the following buffer was used to ensure correct folding and stability of the 14-3-3 protein: 25 mM HEPES, 150 mM NaCl, 10 mM MgCl2, 0.5 mM TCEP, pH 6.5.
Fluorescence spectroscopy was performed in quartz cuvettes using a Shimadzu RF-6000 spectrofluorophotometer, and UV-vis measurements were carried out on a Cary 300 Bio UV-Visible Spectrophotometer at room temperature. The excitation was monitored at 390 nm. The titration started with 5 µM ligand in the cuvette, and protein was added stepwise (up to 8 equiv), leading to an emission increase. Visible precipitation occurred at 7 equiv of 14-3-3, and the fluorescence signal was no longer constant. The emission maximum was found at ≈470 nm (blue emission). To avoid dilution of ligand 1 during the titration, the protein stock solution contained 5 µM of ligand 1.
Concentration- and time-dependent measurements of 1, using UV-vis and fluorescence spectroscopy, showed linear behavior in the concentration range used for the fluorescence titration and Job plot analysis. As a consequence, we assume no intra- or intermolecular stacking of the aromatic moieties in the concentration range used.
The analysis was performed in OriginPro 2019b with the quadratic equation:

A Job's plot analysis was performed to obtain more information about the ratio between ligand 1 and the 14-3-3 monomer [11]. The total concentration used of ligand 1 was 10 µM. The same conditions (buffer, pH value, temperature) as for the fluorescence titration were chosen.

Figure S29: Job's plot of 14-3-3 and ligand 1 at λem = 470 nm. The maximum at 0.5 suggests a ratio of one ligand to one protein monomer, or two ligands to one 14-3-3 dimer.
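The step from the position of the Job plot maximum to a binding ratio can be illustrated with a short sketch. This is not the procedure used for Figure S29: the function name, the three-point parabolic refinement of the maximum and the synthetic signal are assumptions made for the example. If x is the mole fraction of the ligand at the maximum, the ligand-to-protein ratio is x / (1 - x).

```python
import numpy as np

def job_plot_stoichiometry(mole_fraction, signal):
    """Return (x_max, ligand:protein ratio) from the maximum of a Job plot."""
    x = np.asarray(mole_fraction, dtype=float)
    y = np.asarray(signal, dtype=float)
    # Refine the maximum with a parabola through the three points around the largest signal.
    i = int(np.clip(y.argmax(), 1, len(x) - 2))
    a, b, _ = np.polyfit(x[i - 1:i + 2], y[i - 1:i + 2], 2)
    x_max = -b / (2.0 * a)
    return x_max, x_max / (1.0 - x_max)

# Synthetic, idealised 1:1 Job plot: the signal is proportional to x * (1 - x).
x = np.linspace(0.1, 0.9, 9)
y = x * (1.0 - x)
x_max, ratio = job_plot_stoichiometry(x, y)
```

A maximum at x = 0.5 gives a ratio of 1, that is one ligand per monomer, whereas a maximum near 0.67 would indicate two ligands per monomer.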